ABSTRACT

Let $(\theta_1, x_1), \ldots, (\theta_n, x_n)$ be independent and identically distributed random vectors with $E(x \mid \theta) = \theta$ and $\operatorname{Var}(x \mid \theta) = a + b\theta + c\theta^2$. Let $t_i$ be the linear Bayes estimator of $\theta_i$, and let $\hat\theta_i$ be the linear empirical Bayes estimator of $\theta_i$ proposed in Robbins (1983) for the case when $Ex$ and $\operatorname{Var} x$ are unknown to the statistician. The regret of using $\hat\theta_i$ instead of $t_i$ because of ignorance of the mean and the variance is $r_i = E(\hat\theta_i - \theta_i)^2 - E(t_i - \theta_i)^2$. Under appropriate conditions the cumulative regret $R_n = r_1 + \cdots + r_n$ is shown to have a finite limit as $n$ tends to infinity. The limit can be computed explicitly in terms of $a$, $b$, $c$ and the first four moments of $x$.
1. INTRODUCTION AND SUMMARY


In the first Jerzy Neyman Memorial Lecture, Robbins (1983) outlined a wide class of problems concerning the general empirical Bayes approach and the linear empirical Bayes approach to estimation. In this paper we shall study a special case which includes several important standard distributions. Specifically, let $(\theta, x)$ be a random vector such that $\theta$ has a distribution function $G$ and the conditional expectation of $x$ given $\theta$ satisfies

$$E(x \mid \theta) = \theta. \tag{1.1}$$

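Suppose it is desired to use a linear function $A + Bx$ of the observed $x$ to estimate the unknown parameter $\theta$. If the loss function is squared error, the best linear estimate is

$$t(x) = E\theta + \frac{\operatorname{Cov}(\theta, x)}{\operatorname{Var} x}\,(x - Ex) \tag{1.2}$$

and the mean squared error is

$$E(t(x) - \theta)^2 = \operatorname{Var}\theta - \frac{\operatorname{Cov}^2(\theta, x)}{\operatorname{Var} x}. \tag{1.3}$$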


Assume, in addition to (1.1), that

$$\operatorname{Var}(x \mid \theta) = a + b\theta + c\theta^2 \tag{1.4}$$


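for some known constants $a$, $b$, and $c$. Then, since $E\theta = Ex$, $\operatorname{Cov}(\theta, x) = \operatorname{Var}\theta$ and $\operatorname{Var} x = a + b\,Ex + c\,(Ex)^2 + (c+1)\operatorname{Var}\theta$, (1.2) can be written as

$$t(x) = Ex + \left(1 - \frac{c\operatorname{Var} x + a + b\,Ex + c\,(Ex)^2}{(c+1)\operatorname{Var} x}\right)(x - Ex), \tag{1.5}$$

which is computable if $Ex$ and $\operatorname{Var} x$ are known.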
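As a small illustration of (1.5), the Python sketch below evaluates the best linear estimate when $Ex$ and $\operatorname{Var} x$ are known. The function name `linear_bayes` and the parameter values are our own choices. The usage lines take the Poisson case with a gamma prior ($a = c = 0$, $b = 1$, as in Example 3), where the linear rule should coincide with the posterior mean $(\alpha + x)/(\beta + 1)$.

```python
def linear_bayes(x, mean_x, var_x, a, b, c):
    """Best linear estimate (1.5) of theta, assuming E(x|theta) = theta,
    Var(x|theta) = a + b*theta + c*theta**2, and known Ex and Var x."""
    # Shrinkage toward Ex: (c Var x + a + b Ex + c (Ex)^2) / ((c + 1) Var x)
    shrink = (c * var_x + a + b * mean_x + c * mean_x**2) / ((c + 1) * var_x)
    return mean_x + (1.0 - shrink) * (x - mean_x)


# Poisson likelihood with a gamma(alpha, beta) prior: a = c = 0, b = 1.
alpha, beta = 3.0, 2.0
mean_x = alpha / beta                    # E x = E theta
var_x = alpha * (beta + 1) / beta**2     # Var x = E theta + Var theta
for x in range(5):
    print(x, linear_bayes(x, mean_x, var_x, a=0, b=1, c=0), (alpha + x) / (beta + 1))
```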

We shall be dealing with the case when $Ex$ and $\operatorname{Var} x$ are unknown. However, we are faced with a large number $n$ of independent versions of the component problem: $(\theta_1, x_1), \ldots, (\theta_n, x_n)$ are independent random vectors having the same distribution as $(\theta, x)$. Robbins (1983) has proposed to estimate $Ex$ and $\operatorname{Var} x$ respectively by

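$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i \quad\text{and}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2, \tag{1.6}$$

and to use the following statistic (with $h = cs^2 + a + b\bar x + c\bar x^2$)

$$\hat\theta_i = \bar x + \left(1 - \frac{h}{(c+1)s^2}\right)^+ (x_i - \bar x) \tag{1.7}$$

to estimate $\theta_i$, for each $i = 1, \ldots, n$. He has also expressed the hope that, under some mild restrictions on the nature of $G$,

$$E(\hat\theta_i - \theta_i)^2 \to E(t - \theta)^2 \tag{1.8}$$

with reasonable rapidity as $n$ tends to infinity.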
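The statistic (1.7) is straightforward to compute. The sketch below is a minimal NumPy version under our own naming (`linear_eb` is hypothetical, not from the paper); it assumes $s^2 > 0$, i.e. not all $x_i$ are equal, and takes the positive part explicitly. The usage lines simulate the Poisson-gamma setting of Example 3 with arbitrarily chosen $\alpha = 3$, $\beta = 2$ and compare mean squared errors of the raw observations, the empirical Bayes estimates, and the Bayes rule that knows the prior.

```python
import numpy as np


def linear_eb(x, a, b, c):
    """Linear empirical Bayes estimates (1.7): Ex and Var x in (1.5) are replaced
    by the sample mean and the unbiased sample variance (1.6). Assumes s^2 > 0."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    s2 = x.var(ddof=1)                             # divisor n - 1, as in (1.6)
    h = c * s2 + a + b * xbar + c * xbar**2
    factor = max(0.0, 1.0 - h / ((c + 1) * s2))    # (1 - h/((c+1)s^2))^+
    return xbar + factor * (x - xbar)


rng = np.random.default_rng(0)
alpha, beta = 3.0, 2.0
theta = rng.gamma(shape=alpha, scale=1.0 / beta, size=2000)   # gamma(alpha, beta) prior
x = rng.poisson(theta)
est = linear_eb(x, a=0, b=1, c=0)
bayes = (alpha + x) / (beta + 1)                               # posterior mean, prior known
print(np.mean((x - theta)**2), np.mean((est - theta)**2), np.mean((bayes - theta)**2))
```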


We shall assume that the best linear estimate (1.2) is also the best general estimate $E(\theta \mid x)$. This assumption will reduce the class of possible distributions for $\theta$. For instance, if $x$ has a distribution from an exponential family with parameter $\theta$, then the above assumption will limit the class of prior distributions to the conjugate family; see Diaconis and Ylvisaker (1979). However, even this special case will be wide enough to include many standard distributions used in practice. In this case we shall verify (1.8). Indeed, we shall consider the cumulative regret

$$R_n = \sum_{i=1}^n \left(E(\hat\theta_i - \theta_i)^2 - E(t(x_i) - \theta_i)^2\right) \tag{1.9}$$

of using $\hat\theta_i$ instead of $t(x_i)$ because of the statistician's ignorance of $Ex$ and $\operatorname{Var} x$. It can be shown that even as $n$ goes to infinity $R_n$ remains bounded, so that (1.8) will hold. We summarize the main results in the following theorems and leave the proofs to the next section.

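Theorem 1. Let $(\theta, x), (\theta_1, x_1), (\theta_2, x_2), \ldots$ be independent and identically distributed nondegenerate random vectors such that

(i) $E(x \mid \theta) = \theta$ and $\operatorname{Var}(x \mid \theta) = a + b\theta + c\theta^2$;

(ii) $E(\theta \mid x) = E\theta + (x - Ex)\operatorname{Cov}(\theta, x)/\operatorname{Var} x$;

(iii) $Ex^6 < \infty$.

For each $n = 2, 3, \ldots$ and for each $i = 1, 2, \ldots, n$, define

$$\bar x = \bar x_n = \frac{1}{n}\sum_{i=1}^n x_i \quad\text{and}\quad s^2 = s_n^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2,$$

$$t_i = Ex + \left(1 - \frac{c\operatorname{Var} x + a + b\,Ex + c\,(Ex)^2}{(c+1)\operatorname{Var} x}\right)(x_i - Ex),$$

$$\hat\theta_i = \bar x + \left(1 - \frac{cs^2 + a + b\bar x + c\bar x^2}{(c+1)s^2}\right)^+ (x_i - \bar x), \quad\text{and}$$

$$R_n = \sum_{i=1}^n \left(E(\hat\theta_i - \theta_i)^2 - E(t_i - \theta_i)^2\right). \tag{1.10}$$

Then

$$\lim_{n\to\infty} R_n = \frac{H^2}{(c+1)^2\gamma^2} + \frac{\sigma^2}{(c+1)^2\gamma^6}, \tag{1.11}$$

where $\mu = Ex$, $\gamma^2 = \operatorname{Var} x$, $\mu_3 = E(x - \mu)^3$, $\mu_4 = E(x - \mu)^4$, $H = c\gamma^2 + a + b\mu + c\mu^2$, and

$$\sigma^2 = (\mu_4 - \gamma^4)(a + b\mu + c\mu^2)^2 + \gamma^6(b + 2c\mu)^2 - 2\gamma^2\mu_3(a + b\mu + c\mu^2)(b + 2c\mu). \tag{1.12}$$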
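The limit in (1.11) and (1.12) is an explicit function of $a$, $b$, $c$ and the first four moments of $x$. A direct transcription in Python might look as follows (the function name `limiting_regret` is ours). As a sanity check, in a normal case with $b = c = 0$, $a = 1$, $\gamma^2 = 2$ and $\mu_4 = 3\gamma^4$, it returns $3a^2/\gamma^2 = 1.5$, in line with Example 1.

```python
def limiting_regret(a, b, c, mu, gamma2, mu3, mu4):
    """lim R_n from (1.11)-(1.12), given a, b, c and mu = Ex, gamma2 = Var x,
    mu3 = E(x - mu)^3, mu4 = E(x - mu)^4."""
    q = a + b * mu + c * mu**2                  # a + b mu + c mu^2
    H = c * gamma2 + q
    sigma2 = ((mu4 - gamma2**2) * q**2
              + gamma2**3 * (b + 2 * c * mu)**2
              - 2 * gamma2 * mu3 * q * (b + 2 * c * mu))
    return H**2 / ((c + 1)**2 * gamma2) + sigma2 / ((c + 1)**2 * gamma2**3)


# Normal marginal with a = 1, gamma^2 = 2 (so mu3 = 0 and mu4 = 3 gamma^4 = 12):
print(limiting_regret(a=1.0, b=0.0, c=0.0, mu=0.0, gamma2=2.0, mu3=0.0, mu4=12.0))  # 1.5
```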


For the special case when $b = c = 0$, a slightly more general result can be established under weaker conditions. The result is stated in Theorem 2.


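Theorem 2. Let $(\theta_1, x_1), (\theta_2, x_2), \ldots$ be independent random vectors satisfying the following conditions:

(A) For all $i$,

(i) $Ex_i = \mu$ and $\operatorname{Var} x_i = \gamma^2 > 0$;

(ii) $E(x_i \mid \theta_i) = \theta_i$ and $\operatorname{Var}(x_i \mid \theta_i) = a < \gamma^2$;

(iii) $E(\theta_i \mid x_i) = E\theta_i + (x_i - \mu)\operatorname{Cov}(\theta_i, x_i)/\gamma^2$.

(B) (iv) $\{(x_i - \mu)^2, i \ge 1\}$ satisfies the Lindeberg condition;

(v) $\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2$ converges in probability to $\gamma^2$, and $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \operatorname{Var}(x_i - \mu)^2$ is finite.

For each $n = 2, 3, \ldots$ and each $i = 1, 2, \ldots, n$, put

$$t_i = Ex_i + \left(1 - \frac{a}{\operatorname{Var} x_i}\right)(x_i - Ex_i),$$

$$\hat\theta_i = \bar x + \left(1 - \frac{a}{s^2}\right)^+ (x_i - \bar x), \quad\text{and}$$

$$R_n = \sum_{i=1}^n \left(E(\hat\theta_i - \theta_i)^2 - E(t_i - \theta_i)^2\right), \tag{1.13}$$

where $\bar x$ and $s^2$ are as in (1.6). Then

$$\lim_{n\to\infty} R_n = \frac{a^2}{\gamma^2} + \frac{a^2}{\gamma^6}\,K, \tag{1.14}$$

where

$$K = \lim_{n\to\infty} \frac{1}{n}V_n \quad\text{and}\quad V_n = \sum_{i=1}^n \operatorname{Var}(x_i - \mu)^2.$$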


2. PROOF OF THE MAIN RESULTS

We need some preliminary results for Theorem 1.


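Lemma 1. Let $x, x_1, x_2, \ldots$ be independent and identically distributed random variables with $Ex^4 < \infty$. Let the following notation be used: $\mu = Ex$, $\gamma^2 = \operatorname{Var} x$, $\mu_3 = E(x - \mu)^3$ and $\mu_4 = E(x - \mu)^4$. For each $n \ge 1$, put

$$\begin{aligned}
W_{1n} &= \frac{1}{\sqrt n}\sum_{i=1}^n \big((x_i - \mu)^2 - \gamma^2\big), \\
W_{2n} &= \frac{1}{\sqrt n}\Big(\sum_{i=1}^n \mu(x_i - \mu)^2 - \gamma^2\sum_{i=1}^n x_i\Big), \quad\text{and} \\
W_{3n} &= \sqrt n\,(\bar x^2 - \mu^2).
\end{aligned} \tag{2.1}$$

Then, as $n$ tends to infinity, $(W_{1n}, W_{2n}, W_{3n})$ converges in distribution to a multivariate normal distribution with mean 0 and covariance matrix $\Sigma = (\sigma_{ij})$, where

$$\begin{aligned}
\sigma_{11} &= \mu_4 - \gamma^4, \quad \sigma_{22} = \mu^2(\mu_4 - \gamma^4) + \gamma^6 - 2\gamma^2\mu\mu_3, \quad \sigma_{33} = 4\mu^2\gamma^2, \\
\sigma_{12} &= \mu(\mu_4 - \gamma^4) - \gamma^2\mu_3, \quad \sigma_{13} = 2\mu\mu_3, \quad\text{and}\quad \sigma_{23} = 2\mu^2\mu_3 - 2\mu\gamma^4.
\end{aligned} \tag{2.2}$$

The proof of the lemma is straightforward and is omitted.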
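The covariance entries in (2.2) can be checked without any limit argument: they are the exact covariances of the per-observation summands $\big((x-\mu)^2,\ \mu(x-\mu)^2 - \gamma^2 x,\ 2\mu x\big)$, where the last one is the delta-method linearization of $\sqrt n(\bar x^2 - \mu^2)$ (this linearization is our gloss, since the paper omits the proof). The Python sketch below compares the two for an arbitrary three-point distribution; the function name `lemma1_cov` and the test distribution are our own choices.

```python
import numpy as np


def lemma1_cov(mu, gamma2, mu3, mu4):
    """Covariance matrix (sigma_ij) as given in (2.2)."""
    k = mu4 - gamma2**2
    s11 = k
    s22 = mu**2 * k + gamma2**3 - 2 * gamma2 * mu * mu3
    s33 = 4 * mu**2 * gamma2
    s12 = mu * k - gamma2 * mu3
    s13 = 2 * mu * mu3
    s23 = 2 * mu**2 * mu3 - 2 * mu * gamma2**2
    return np.array([[s11, s12, s13],
                     [s12, s22, s23],
                     [s13, s23, s33]])


# Hypothetical three-point distribution for x.
support = np.array([0.0, 1.0, 4.0])
probs = np.array([0.5, 0.3, 0.2])
mu = probs @ support
d = support - mu
gamma2, mu3, mu4 = probs @ d**2, probs @ d**3, probs @ d**4

# Per-observation summands of (W1n, W2n, W3n): rows = components, columns = support points.
Z = np.vstack([d**2, mu * d**2 - gamma2 * support, 2 * mu * support])
Zc = Z - (Z @ probs)[:, None]          # center each component
exact = (Zc * probs) @ Zc.T            # sum_j p_j Zc[:, j] Zc[:, j]^T
print(np.allclose(exact, lemma1_cov(mu, gamma2, mu3, mu4)))   # True
```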

Corollary 1. Under the same conditions as Lemma 1, as $n \to \infty$, $(a + c\mu^2)W_{1n} + bW_{2n} - c\gamma^2 W_{3n}$ has an asymptotic normal distribution with mean 0 and variance

$$\sigma^2 = (\mu_4 - \gamma^4)(a + b\mu + c\mu^2)^2 + \gamma^6(b + 2c\mu)^2 - 2\gamma^2\mu_3(a + b\mu + c\mu^2)(b + 2c\mu). \tag{2.3}$$
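Likewise, $\sigma^2$ in (2.3) is the exact variance of the single-observation combination $(a + c\mu^2)(x-\mu)^2 + b\big(\mu(x-\mu)^2 - \gamma^2 x\big) - 2c\gamma^2\mu x$, i.e. the quadratic form $w^\top\Sigma w$ with $w = (a + c\mu^2,\ b,\ -c\gamma^2)$. A numerical sketch, assuming a hypothetical three-point distribution and arbitrary $a$, $b$, $c$ (the name `sigma2_formula` is ours):

```python
import numpy as np


def sigma2_formula(a, b, c, mu, gamma2, mu3, mu4):
    """sigma^2 as given in (2.3)."""
    q = a + b * mu + c * mu**2
    return ((mu4 - gamma2**2) * q**2
            + gamma2**3 * (b + 2 * c * mu)**2
            - 2 * gamma2 * mu3 * q * (b + 2 * c * mu))


support = np.array([-1.0, 0.5, 3.0])   # hypothetical distribution of x
probs = np.array([0.2, 0.5, 0.3])
a, b, c = 0.7, -0.4, 1.3               # arbitrary constants
mu = probs @ support
d = support - mu
gamma2, mu3, mu4 = probs @ d**2, probs @ d**3, probs @ d**4

# Single-observation version of (a + c mu^2) W1n + b W2n - c gamma^2 W3n (W3n linearized).
ell = (a + c * mu**2) * d**2 + b * (mu * d**2 - gamma2 * support) - 2 * c * gamma2 * mu * support
var_ell = probs @ (ell - probs @ ell)**2
print(np.isclose(var_ell, sigma2_formula(a, b, c, mu, gamma2, mu3, mu4)))   # True
```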


Lemma 2. Let $x, x_1, x_2, \ldots$ be independent and identically distributed random variables with mean $\mu$ and variance

$$\gamma^2 = a + b\,Ex + c\,(Ex)^2 + d, \tag{2.4}$$

where $a$, $b$, $c$ and $d$ are constants and $d > 0$. Assume $Ex^4 < \infty$. For each $n \ge 2$, put

$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i \quad\text{and}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2. \tag{2.5}$$

Then, as $n$ tends to infinity,

$$nP\left[s^2 \le a + b\bar x + c\bar x^2\right] \to 0. \tag{2.6}$$


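Proof. Choose $\delta > 0$ such that $\varepsilon = d - |c+1|\delta - |b + 2\mu c|\delta^{1/2} > 0$. By (2.4), $na + nb\bar x + nc\bar x^2 - n\gamma^2 + n(\bar x - \mu)^2 = n\big((c+1)(\bar x - \mu)^2 + (b + 2\mu c)(\bar x - \mu) - d\big)$, which is at most $-\varepsilon n$ when $(\bar x - \mu)^2 < \delta$. Hence

$$\begin{aligned}
P[s^2 \le a + b\bar x + c\bar x^2] &\le P\Big[\sum\big((x_i - \mu)^2 - \gamma^2\big) \le na + nb\bar x + nc\bar x^2 - n\gamma^2 + n(\bar x - \mu)^2\Big] \\
&\le P\Big[\sum\big((x_i - \mu)^2 - \gamma^2\big) \le -\varepsilon n,\ (\bar x - \mu)^2 < \delta\Big] + P\big[(\bar x - \mu)^2 \ge \delta\big].
\end{aligned} \tag{2.7}$$

Let $B = \{\sum((x_i - \mu)^2 - \gamma^2) \le -\varepsilon n\}$. Then the first term in (2.7) is less than or equal to

$$P\Big[\Big|\sum\big((x_i - \mu)^2 - \gamma^2\big)\Big| \ge \varepsilon n\Big], \tag{2.8}$$

which is $o(1/n)$ as $n$ tends to infinity by the uniform integrability of $\{(\sum((x_i - \mu)^2 - \gamma^2))^2/n,\ n \ge 1\}$ implied by $Ex^4 < \infty$. For the second term in (2.7),

$$P\big[(\bar x - \mu)^2 \ge \delta\big] \le \frac{1}{\delta^2 n^4}\Big(n\big(E(x - \mu)^4 - \gamma^4\big) + (3n^2 - 2n)\gamma^4\Big), \tag{2.9}$$

which is $o(1/n)$ as $n$ tends to infinity. This concludes the proof of Lemma 2.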
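The Chebyshev-type bound (2.9), and later (2.29), rest on the identity $E\big(\sum_i (x_i - \mu)\big)^4 = V_n + (3n^2 - 2n)\gamma^4$ for independent $x_i$ with common mean $\mu$ and variance $\gamma^2$, where $V_n = \sum_i \operatorname{Var}(x_i - \mu)^2$ (in the i.i.d. case $V_n = n(\mu_4 - \gamma^4)$). Dividing by $n^4\delta^2$ bounds $P[(\bar x - \mu)^2 \ge \delta] \le E(\bar x - \mu)^4/\delta^2$. The sketch below checks the identity exactly by enumeration, assuming three hypothetical, non-identical distributions that all have mean 0 and variance 1.

```python
import itertools

import numpy as np

# Three hypothetical distributions, each with mean 0 and variance 1.
dists = [
    (np.array([-1.0, 1.0]), np.array([0.5, 0.5])),
    (np.array([-np.sqrt(2.0), 0.0, np.sqrt(2.0)]), np.array([0.25, 0.5, 0.25])),
    (np.array([-0.5, 2.0]), np.array([0.8, 0.2])),
]
mu, gamma2 = 0.0, 1.0
n = len(dists)

# Exact E(sum_i (x_i - mu))^4 by enumerating all joint outcomes.
exact = 0.0
for combo in itertools.product(*[list(zip(s, p)) for s, p in dists]):
    values = [v for v, _ in combo]
    prob = np.prod([p for _, p in combo])
    exact += prob * (sum(values) - n * mu)**4

# V_n = sum_i Var(x_i - mu)^2 = sum_i (E(x_i - mu)^4 - gamma^4)
V_n = sum(p @ (s - mu)**4 - gamma2**2 for s, p in dists)
formula = V_n + (3 * n**2 - 2 * n) * gamma2**2
print(np.isclose(exact, formula))   # True
```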

Lemma 3. Let $x, x_1, x_2, \ldots$ be independent and identically distributed random variables with mean $\mu$ and variance $\gamma^2$. Assume that $Ex^6 < \infty$. For each $n \ge 2$, put

$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i \quad\text{and}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2. \tag{2.10}$$

Then the following families of random variables are uniformly integrable:

(i) $\{ns^2(s^2 - \gamma^2)^2,\ n \ge 2\}$,

(ii) $\{ns^2(\bar x - \mu)^2,\ n \ge 2\}$,

(iii) $\{ns^2(\bar x^2 - \mu^2)^2,\ n \ge 2\}$. (2.11)

Proof. We shall verify (i) and (iii). The verification for (ii) is entirely analogous and hence omitted. For (i),


for some constant $K$.

Since $Ex^6 < \infty$, it is clear that the four terms on the bottom line of (2.12) are uniformly integrable. For (iii), for any event $D$ with $P[D]$ small,

$$E\,ns^2(\bar x^2 - \mu^2)^2 I_D \le K\left(E\Big(\frac{\sum(x_i - \mu)^2}{n}\Big)^3 I_D\right)^{1/3}\left(E\big(n(\bar x^2 - \mu^2)^2\big)^{3/2} I_D\right)^{2/3}, \tag{2.13}$$

which can be made small uniformly in $n$.

Now we are ready to give the proofs. We use the fact that convergence in distribution together with uniform integrability implies convergence of moments; for a reference, see Chow and Teicher (1978), Section 8.1.


Proof of Theorem 1. Let

$$h = cs^2 + a + b\bar x + c\bar x^2. \tag{2.14}$$


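From the identity

$$(\hat\theta_i - \theta_i)^2 - (t_i - \theta_i)^2 = (\hat\theta_i - t_i)^2 + 2(\hat\theta_i - t_i)(t_i - \theta_i), \tag{2.15}$$

taking expectation and summation, and using assumptions (i) and (ii) (which, together with independence, give $E(\theta_i \mid x_1, \ldots, x_n) = t_i$, so that the cross term has zero expectation), we obtain the cumulative regret

$$R_n = \frac{H^2}{(c+1)^2\gamma^2} + E\left(\left(\frac{H}{(c+1)\gamma^2} - \frac{h}{\max(h, (c+1)s^2)}\right)^2 \sum_{i=1}^n (x_i - \bar x)^2\right). \tag{2.16}$$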
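The decomposition behind (2.16) is an algebraic identity. Writing $B = H/((c+1)\gamma^2)$ and $\hat B = h/\max(h, (c+1)s^2)$ (our shorthand), one has $\hat\theta_i - t_i = B(\bar x - \mu) + (B - \hat B)(x_i - \bar x)$, so $\sum_i(\hat\theta_i - t_i)^2 = nB^2(\bar x - \mu)^2 + (B - \hat B)^2\sum_i(x_i - \bar x)^2$, and $E\,n(\bar x - \mu)^2 = \gamma^2$ gives the first term of (2.16). The sketch below checks the identity on simulated data; the values of $B$ and $\hat B$ are arbitrary, since the identity holds for any $\hat B$ that does not depend on $i$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, gamma2 = 50, 2.0, 3.0
B = 0.4        # plays the role of H/((c+1) gamma^2); arbitrary here
B_hat = 0.55   # plays the role of h/max(h, (c+1) s^2); arbitrary here
x = rng.normal(mu, np.sqrt(gamma2), size=n)
xbar = x.mean()

t = mu + (1 - B) * (x - mu)                   # oracle linear rule, cf. (1.5)
theta_hat = xbar + (1 - B_hat) * (x - xbar)   # empirical rule, cf. (1.7)
lhs = np.sum((theta_hat - t)**2)
rhs = n * B**2 * (xbar - mu)**2 + (B - B_hat)**2 * np.sum((x - xbar)**2)
print(np.isclose(lhs, rhs))   # True
```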

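As $n$ tends to infinity, $s^2$ and $h$ converge to $\gamma^2$ and $H$, respectively, with probability one. Since (see Robbins (1983))

$$\gamma^2 = \frac{H}{c+1} + \operatorname{Var}\theta \tag{2.17}$$

and $\theta$ is nondegenerate, we have $(c+1)\gamma^2 > H$, so asymptotically the term inside the expectation sign in (2.16) is equivalent to

$$\begin{aligned}
&\frac{n\,(s^2 H - \gamma^2 h)^2}{(c+1)^2\gamma^4 s^2} \\
&\quad= \frac{1}{(c+1)^2\gamma^4 s^2}\Big((a + c\mu^2)\sqrt n\,(s^2 - \gamma^2) + b\sqrt n\,(\mu s^2 - \gamma^2\bar x) - c\gamma^2\sqrt n\,(\bar x^2 - \mu^2)\Big)^2 \\
&\quad= \frac{1}{(c+1)^2\gamma^6}\Big((a + c\mu^2)\frac{\sum\big((x_i - \mu)^2 - \gamma^2\big)}{\sqrt n} + b\,\frac{\sum\big(\mu(x_i - \mu)^2 - \gamma^2 x_i\big)}{\sqrt n} - c\gamma^2\sqrt n\,(\bar x^2 - \mu^2) + o_p(1)\Big)^2.
\end{aligned} \tag{2.18}$$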

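By Corollary 1 and (2.18) (with $\chi_1^2$ denoting a chi-squared random variable with 1 degree of freedom),

$$T = \left(\frac{H}{(c+1)\gamma^2} - \frac{h}{\max(h, (c+1)s^2)}\right)^2 \sum_{i=1}^n (x_i - \bar x)^2 \tag{2.19}$$

converges in distribution to $\sigma^2\chi_1^2/\big((c+1)^2\gamma^6\big)$, where $\sigma^2$ is defined in (2.3). Next we shall show that $\{T, n \ge 2\}$ is uniformly integrable, so that, as $n$ tends to infinity,

$$ET \to \frac{\sigma^2}{(c+1)^2\gamma^6}. \tag{2.20}$$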


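Let $A$ be the event $\{(c+1)s^2 \le h\}$. On $A$ the shrinkage factor $h/\max(h, (c+1)s^2)$ equals 1 and $s^2 \le a + b\bar x + c\bar x^2$, so $T \le Kn(a + b\bar x + c\bar x^2)$ for some positive $K$. Hence

$$\begin{aligned}
E\,T I_A &\le K\int_A (na + nb\bar x + nc\bar x^2)\,dP \\
&\le K(a + b\mu + c\mu^2)\,nP[A] + Kbn\int_A |\bar x - \mu|\,dP + Kcn\int_A |\bar x^2 - \mu^2|\,dP \\
&\le K(a + b\mu + c\mu^2)\,nP[A] + Kb\sqrt n\,\big(E\,n(\bar x - \mu)^2\,P[A]\big)^{1/2} + Kc\sqrt n\,\big(E\,n(\bar x^2 - \mu^2)^2\,P[A]\big)^{1/2},
\end{aligned} \tag{2.21}$$

which is $o(1)$ by Lemma 2, applied with $d = (c+1)\operatorname{Var}\theta > 0$ (note that $A = \{s^2 \le a + b\bar x + c\bar x^2\}$ and, by (2.17), $\gamma^2 = a + b\mu + c\mu^2 + (c+1)\operatorname{Var}\theta$). On the complement of $A$,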

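$$\begin{aligned}
(c+1)^2\gamma^4\,T &= (c+1)^2\gamma^4\left(\frac{H}{(c+1)\gamma^2} - \frac{h}{(c+1)s^2}\right)^2\sum(x_i - \bar x)^2 \\
&= \left((H - h) + \frac{h}{s^2}(s^2 - \gamma^2)\right)^2\sum(x_i - \bar x)^2 \\
&\le 4c^2(s^2 - \gamma^2)^2\sum(x_i - \bar x)^2 + 4b^2(\bar x - \mu)^2\sum(x_i - \bar x)^2 \\
&\quad + 4c^2(\bar x^2 - \mu^2)^2\sum(x_i - \bar x)^2 + 4(c+1)^2 n s^2(s^2 - \gamma^2)^2,
\end{aligned} \tag{2.22}$$

where the last term uses $h < (c+1)s^2$ on the complement of $A$. By Lemma 3, the four terms on the right-hand side of (2.22) are uniformly integrable. Therefore the regret satisfies

$$\lim_{n\to\infty} R_n = \frac{H^2}{(c+1)^2\gamma^2} + \frac{\sigma^2}{(c+1)^2\gamma^6}. \tag{2.23}$$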

Remark. The expression for $\sigma^2$ involves moments of $x$ only up to the fourth, although it has terms of the eighth order (e.g. $c^2\mu^4\mu_4$). The sixth-moment assumption serves only to ensure the uniform integrability of families such as $\{ns^2(s^2 - \gamma^2)^2,\ n \ge 2\}$ in Lemma 3. Nevertheless, it is reasonable to conjecture that condition (iii) in Theorem 1 can be replaced by $Ex^4 < \infty$.

Proof of Theorem 2. From the identity

$$(\hat\theta_i - \theta_i)^2 - (t_i - \theta_i)^2 = (\hat\theta_i - t_i)^2 + 2(\hat\theta_i - t_i)(t_i - \theta_i), \tag{2.24}$$

taking summation and expectation, and using assumptions (i), (ii) and (iii) and definition (1.13), we find that the regret equals

$$R_n = \frac{a^2}{\gamma^2} + a^2 E\left(\left(\frac{1}{\gamma^2} - \frac{1}{\max(s^2, a)}\right)^2 \sum_{i=1}^n (x_i - \bar x)^2\right). \tag{2.25}$$

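Let $v_i = \operatorname{Var}(x_i - \mu)^2$ and $V_n = v_1 + v_2 + \cdots + v_n$. Consider the event $A = \{s^2 \le a\}$; then

$$E\left(\frac{1}{\gamma^2} - \frac{1}{\max(s^2, a)}\right)^2 \sum_{i=1}^n (x_i - \bar x)^2\, I_A \le C\,nP[s^2 \le a] \tag{2.26}$$

for some positive constant $C$. And

$$\begin{aligned}
P[s^2 \le a] &\le P\Big[\sum\big((x_i - \mu)^2 - \gamma^2\big) \le (a - \gamma^2)n + n(\bar x - \mu)^2\Big] \\
&\le P\Big[\sum\big((x_i - \mu)^2 - \gamma^2\big) \le (a - \gamma^2)n + n(\bar x - \mu)^2,\ (\bar x - \mu)^2 < \delta\Big] + P\big[(\bar x - \mu)^2 \ge \delta\big].
\end{aligned} \tag{2.27}$$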

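Choose $\delta > 0$ such that $a - \gamma^2 + \delta = -\varepsilon < 0$, and let $B = \{\sum((x_i - \mu)^2 - \gamma^2) \le -\varepsilon n\}$. Then

$$nP\Big[\sum\big((x_i - \mu)^2 - \gamma^2\big) \le -\varepsilon n\Big] \le \frac{V_n}{\varepsilon^2 n}\int_B \frac{\big(\sum((x_i - \mu)^2 - \gamma^2)\big)^2}{V_n}\,dP, \tag{2.28}$$

which goes to zero as $n$ tends to infinity by conditions (iv) and (v) and Brown's theorem (see Chow and Teicher (1978), p. 398).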
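Next, consider

$$nP\big[(\bar x - \mu)^2 \ge \delta\big] \le \frac{n}{\delta^2}E(\bar x - \mu)^4 = \frac{1}{\delta^2 n^3}\Big(\sum_{i=1}^n \operatorname{Var}(x_i - \mu)^2 + (3n^2 - 2n)\gamma^4\Big), \tag{2.29}$$

which goes to zero as $n$ tends to infinity by condition (v). On the complement of $A$, and for any event $D$,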


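$$\begin{aligned}
E\left(\frac{1}{\gamma^2} - \frac{1}{\max(s^2, a)}\right)^2 \sum(x_i - \bar x)^2\, I_{A^c} I_D
&= E\left(\frac{\sum(x_i - \bar x)^2 - (n-1)\gamma^2}{\gamma^2\sum(x_i - \bar x)^2}\right)^2 \sum(x_i - \bar x)^2\, I_{A^c} I_D \\
&\le \frac{V_n}{(n-1)a\gamma^4}\int_{A^c \cap D} \frac{\big(\sum(x_i - \bar x)^2 - (n-1)\gamma^2\big)^2}{V_n}\,dP,
\end{aligned} \tag{2.30}$$

which may be made very small uniformly in $n$ if $P[D]$ is small, for the same reason as in (2.28). Therefore the family

$$\left\{\left(\frac{1}{\gamma^2} - \frac{1}{\max(s^2, a)}\right)^2 \sum(x_i - \bar x)^2,\ n \ge 2\right\} \tag{2.31}$$

is uniformly integrable. By condition (v), as $n$ tends to infinity, $s^2$ tends to $\gamma^2$ and $\bar x$ tends to $\mu$ in probability. By condition (iv), as $n$ tends to infinity,

$$\frac{\sum(x_i - \bar x)^2 - (n-1)\gamma^2}{\sqrt{V_n}} \to N(0, 1) \quad\text{in distribution.} \tag{2.32}$$

Hence

$$E\left(\frac{1}{\gamma^2} - \frac{1}{\max(s^2, a)}\right)^2 \sum(x_i - \bar x)^2 \to \frac{1}{\gamma^6}\lim_{n\to\infty}\frac{V_n}{n}, \tag{2.33}$$

that is,

$$\lim_{n\to\infty} R_n = \frac{a^2}{\gamma^2} + \frac{a^2}{\gamma^6}\lim_{n\to\infty}\frac{V_n}{n}. \tag{2.34}$$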


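Corollary. Let $(\theta_1, x_1), (\theta_2, x_2), \ldots$ be independent and identically distributed random vectors satisfying condition (A) in Theorem 2. If $Ex^4 < \infty$ and $R_n$ and $\hat\theta_i$ are defined in the same way as in (1.13) and (1.14), then

$$\lim_{n\to\infty} R_n = \frac{a^2}{\gamma^2}\,E\left(\frac{x - \mu}{\gamma}\right)^4. \tag{2.35}$$

This follows from (1.14), since in the i.i.d. case $K = \operatorname{Var}(x - \mu)^2 = E(x - \mu)^4 - \gamma^4$.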


Example 1. Suppose $\theta$ has a common normal distribution with mean $\mu$ and variance $\tau^2$ and, given $\theta$, $x$ has a normal distribution with mean $\theta$ and variance $a$. Then $x$ has a normal distribution with mean $\mu$ and variance $\gamma^2 = a + \tau^2$. Obviously the conditions of the corollary hold, and $E((x - \mu)/\gamma)^4 = 3$; hence the regret $R_n$ is $3a^2/(a + \tau^2) + o(1)$ as $n$ tends to infinity.

In this normal case $\hat\theta_i$ is a variant of the James-Stein estimator, which has been studied extensively in the literature; see Efron and Morris (1973).
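In the normal case the limit can also be checked by simulation. Because the cross term in (2.24) has zero expectation, $R_n = E\sum_i(\hat\theta_i - t_i)^2$, which depends on the $x_i$ alone, so the sketch below draws the $x_i$ directly from $N(\mu, a + \tau^2)$. The function name, sample sizes and seed are arbitrary choices of ours, and the Monte Carlo value is only expected to be close to $3a^2/(a + \tau^2)$, not equal to it.

```python
import numpy as np


def simulated_regret(a, tau2, mu=0.0, n=400, reps=2000, seed=0):
    """Monte Carlo estimate of R_n = E sum_i (theta_hat_i - t_i)^2 in Example 1."""
    rng = np.random.default_rng(seed)
    gamma2 = a + tau2
    x = rng.normal(mu, np.sqrt(gamma2), size=(reps, n))   # marginal N(mu, a + tau^2)
    xbar = x.mean(axis=1, keepdims=True)
    s2 = x.var(axis=1, ddof=1, keepdims=True)
    t = mu + (1 - a / gamma2) * (x - mu)                          # oracle t_i
    theta_hat = xbar + (1 - a / np.maximum(s2, a)) * (x - xbar)   # (1 - a/s^2)^+
    return np.mean(np.sum((theta_hat - t)**2, axis=1))


a, tau2 = 1.0, 1.0
# The Monte Carlo value should be close to 3 a^2 / (a + tau^2) = 1.5.
print(simulated_regret(a, tau2), 3 * a**2 / (a + tau2))
```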


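Example 2. Suppose $\theta$ has an inverted gamma distribution with density function

$$g(\theta) = \begin{cases} \dfrac{\beta^\alpha}{\Gamma(\alpha)}\,\theta^{-(\alpha+1)}e^{-\beta/\theta} & \text{if } \theta > 0, \\ 0 & \text{if } \theta \le 0, \end{cases}$$

where $\beta > 0$ and $\alpha > 6$, and, given $\theta$, $x$ has an exponential distribution with mean $\theta$ and variance $\theta^2$. The conditions (i), (ii) and (iii) in Theorem 1 hold with $a = b = 0$ and $c = 1$. It can be computed that, for any $0 \le p < \alpha$,

$$Ex^p = \frac{\beta^p\,\Gamma(p+1)\,\Gamma(\alpha - p)}{\Gamma(\alpha)}.$$

If the linear empirical Bayes estimators are used, then the cumulative regret will satisfy

$$\lim_{n\to\infty} R_n = \frac{2(\alpha - 1)(\alpha^2 - 4\alpha + 6)\,\beta^2}{\alpha^2(\alpha - 2)(\alpha - 3)(\alpha - 4)}.$$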
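The closed form above can be cross-checked against the general formula (1.11) and (1.12) by computing the central moments of $x$ from the raw moments $Ex^p$. The sketch below does this numerically; the parameter values $\alpha = 7.5$, $\beta = 1.3$ and the helper names are our own assumptions.

```python
from math import gamma as Gamma, isclose


def limiting_regret(a, b, c, mu, gamma2, mu3, mu4):
    """lim R_n from (1.11)-(1.12)."""
    q = a + b * mu + c * mu**2
    H = c * gamma2 + q
    sigma2 = ((mu4 - gamma2**2) * q**2 + gamma2**3 * (b + 2 * c * mu)**2
              - 2 * gamma2 * mu3 * q * (b + 2 * c * mu))
    return H**2 / ((c + 1)**2 * gamma2) + sigma2 / ((c + 1)**2 * gamma2**3)


def central_moments(m1, m2, m3, m4):
    """Mean, variance, third and fourth central moments from raw moments."""
    mu = m1
    gamma2 = m2 - mu**2
    mu3 = m3 - 3 * mu * m2 + 2 * mu**3
    mu4 = m4 - 4 * mu * m3 + 6 * mu**2 * m2 - 3 * mu**4
    return mu, gamma2, mu3, mu4


alpha, beta = 7.5, 1.3    # arbitrary values with alpha > 6
raw = [beta**p * Gamma(p + 1) * Gamma(alpha - p) / Gamma(alpha) for p in range(1, 5)]
via_theorem1 = limiting_regret(0, 0, 1, *central_moments(*raw))
closed_form = (2 * (alpha - 1) * (alpha**2 - 4 * alpha + 6) * beta**2
               / (alpha**2 * (alpha - 2) * (alpha - 3) * (alpha - 4)))
print(via_theorem1, closed_form, isclose(via_theorem1, closed_form, rel_tol=1e-9))
```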

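Example 3. Suppose $\theta$ has a gamma distribution with density function

$$g(\theta) = \begin{cases} \dfrac{\beta^\alpha\,\theta^{\alpha-1}e^{-\beta\theta}}{\Gamma(\alpha)} & \text{if } \theta > 0, \\ 0 & \text{if } \theta \le 0, \end{cases}$$

where $\alpha$ and $\beta$ are positive constants, and, given $\theta$, $x$ has a Poisson distribution with mean $\theta$. In this case the conditions (i), (ii) and (iii) in Theorem 1 hold with $a = c = 0$ and $b = 1$. If the linear empirical Bayes estimators are used, then the regret will satisfy

$$\lim_{n\to\infty} R_n = \frac{3\alpha(\beta + 1) + 2\beta + 3}{(\beta + 1)^2}.$$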
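As with Example 2, this value can be cross-checked numerically against (1.11) and (1.12). The sketch below assumes the standard Poisson moment identities $E(x^2 \mid \theta) = \theta + \theta^2$, $E(x^3 \mid \theta) = \theta + 3\theta^2 + \theta^3$, $E(x^4 \mid \theta) = \theta + 7\theta^2 + 6\theta^3 + \theta^4$ and the gamma moments $E\theta^k = \Gamma(\alpha + k)/(\Gamma(\alpha)\beta^k)$; the parameter values and helper names are our own.

```python
from math import gamma as Gamma, isclose


def limiting_regret(a, b, c, mu, gamma2, mu3, mu4):
    """lim R_n from (1.11)-(1.12)."""
    q = a + b * mu + c * mu**2
    H = c * gamma2 + q
    sigma2 = ((mu4 - gamma2**2) * q**2 + gamma2**3 * (b + 2 * c * mu)**2
              - 2 * gamma2 * mu3 * q * (b + 2 * c * mu))
    return H**2 / ((c + 1)**2 * gamma2) + sigma2 / ((c + 1)**2 * gamma2**3)


def poisson_gamma_central_moments(alpha, beta):
    """mu, gamma^2, mu3, mu4 of x when x | theta ~ Poisson(theta) and theta has
    the gamma density of Example 3 (shape alpha, rate beta)."""
    e = [Gamma(alpha + k) / (Gamma(alpha) * beta**k) for k in range(5)]   # E theta^k
    m1 = e[1]
    m2 = e[1] + e[2]
    m3 = e[1] + 3 * e[2] + e[3]
    m4 = e[1] + 7 * e[2] + 6 * e[3] + e[4]
    mu = m1
    gamma2 = m2 - mu**2
    mu3 = m3 - 3 * mu * m2 + 2 * mu**3
    mu4 = m4 - 4 * mu * m3 + 6 * mu**2 * m2 - 3 * mu**4
    return mu, gamma2, mu3, mu4


alpha, beta = 3.0, 2.0
via_theorem1 = limiting_regret(0, 1, 0, *poisson_gamma_central_moments(alpha, beta))
closed_form = (3 * alpha * (beta + 1) + 2 * beta + 3) / (beta + 1)**2
print(via_theorem1, closed_form)
```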